Improved Precision and Recall Metric for Assessing Generative Models

Neural Information Processing Systems

The ability to automatically estimate the quality and coverage of the samples produced by a generative model is a vital requirement for driving algorithm research. We present an evaluation metric that can separately and reliably measure both of these aspects in image generation tasks by forming explicit, non-parametric representations of the manifolds of real and generated data. We demonstrate the effectiveness of our metric in StyleGAN and BigGAN by providing several illustrative examples where existing metrics yield uninformative or contradictory results. Furthermore, we analyze multiple design variants of StyleGAN to better understand the relationships between the model architecture, training methods, and the properties of the resulting sample distribution. In the process, we identify new variants that improve the state-of-the-art. We also perform the first principled analysis of truncation methods and identify an improved method. Finally, we extend our metric to estimate the perceptual quality of individual samples, and use this to study latent space interpolations.
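The non-parametric manifold idea in the abstract can be sketched as follows: each distribution's manifold is approximated by hyperspheres around its samples (radius = distance to the k-th nearest neighbor), precision is the fraction of generated samples inside the real manifold, and recall the fraction of real samples inside the generated manifold. This is a minimal sketch, assuming feature vectors have already been extracted (the paper uses a pre-trained network for this); the function names are illustrative, not the authors' code.

```python
import numpy as np

def knn_radii(feats, k):
    # Radius of each point's hypersphere: distance to its k-th nearest
    # neighbor (column 0 of the sorted distances is the point itself).
    d = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
    d.sort(axis=1)
    return d[:, k]

def manifold_coverage(query, ref, ref_radii):
    # Fraction of query points that fall inside at least one of the
    # reference hyperspheres, i.e. onto the estimated reference manifold.
    d = np.linalg.norm(query[:, None] - ref[None, :], axis=-1)
    return float(np.mean((d <= ref_radii[None, :]).any(axis=1)))

def precision_recall(real_feats, fake_feats, k=3):
    precision = manifold_coverage(fake_feats, real_feats, knn_radii(real_feats, k))
    recall = manifold_coverage(real_feats, fake_feats, knn_radii(fake_feats, k))
    return precision, recall
```

When the two sample sets coincide, both values are 1; when they are far apart, both drop to 0, which matches the intended reading of the metric.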



Reviews: Improved Precision and Recall Metric for Assessing Generative Models

Neural Information Processing Systems

Originality: This paper builds on intuition similar to [1]: precision should reflect how well the generated images are captured by the real images, and recall how well the real images are captured by the generated images. Instead of using a PR curve, the authors define two scalar values in the style of information-retrieval metrics and argue this is better by showing counterexamples on StyleGAN with the truncation trick. I think the main contribution is the empirical evaluation on large-scale GANs: they evaluate StyleGAN and BigGAN and show the tradeoff between precision and recall as the truncation trick is varied.
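The truncation trick the review refers to can be sketched in a few lines: sampled latents are pulled toward their mean, so psi = 1 leaves the distribution unchanged while psi → 0 concentrates samples near the average, trading variety (recall) for fidelity (precision). This is a hedged sketch with illustrative variable names, not StyleGAN's actual implementation.

```python
import numpy as np

def truncate(w, w_avg, psi):
    # Interpolate each latent toward the mean latent; smaller psi means
    # stronger truncation and hence less sample diversity.
    return w_avg + psi * (w - w_avg)

rng = np.random.RandomState(1)
w = rng.randn(1000, 8)            # stand-in for mapped latent vectors
w_avg = w.mean(axis=0)
w_trunc = truncate(w, w_avg, psi=0.5)
# The spread of the truncated latents shrinks roughly by the factor psi.
```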


Reviews: Improved Precision and Recall Metric for Assessing Generative Models

Neural Information Processing Systems

This paper proposes a new metric for mode collapse: a pair of scalar values that can be read off from the previously proposed measure of mode collapse in PacGAN. Precisely, one can read two points from the mode-collapse region: (i) where the region touches the vertical (\delta) axis, and (ii) where the region touches the \delta = 1 line. These are exactly P_r(support{P_g}) and P_g(support{P_r}), which define the proposed scalar-valued mode-collapse measure. This should be explained precisely in the paper, as (i) PacGAN introduced a proper mathematical notion of mode collapse earlier, (ii) the mode-collapse region strictly generalizes the proposed metric, and (iii) mode-collapse regions are the foundation of understanding mode collapse theoretically. A new estimator based on nearest-neighbor distances is proposed, with extensive numerical validation of the proposed metric.
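The two quantities the review names, P_r(support{P_g}) and P_g(support{P_r}), are easy to illustrate on a toy discrete example. The probabilities below are hypothetical, chosen only to show a collapsed generator; they are not from the paper.

```python
import numpy as np

# Real distribution over 5 modes vs. a generator collapsed onto 2 of them.
p_r = np.array([0.4, 0.3, 0.2, 0.1, 0.0])
p_g = np.array([0.5, 0.5, 0.0, 0.0, 0.0])

# P_r(support{P_g}): real mass the generator's support covers (recall-like).
recall_like = p_r[p_g > 0].sum()       # 0.4 + 0.3 = 0.7
# P_g(support{P_r}): generated mass landing on real modes (precision-like).
precision_like = p_g[p_r > 0].sum()    # 0.5 + 0.5 = 1.0
```

Here the generator produces only plausible modes (precision-like value 1.0) but misses much of the real distribution (recall-like value 0.7), the signature of mode collapse.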



Improved Precision and Recall Metric for Assessing Generative Models

Kynkäänniemi, Tuomas, Karras, Tero, Laine, Samuli, Lehtinen, Jaakko, Aila, Timo

Neural Information Processing Systems
